Abstract
High-frequency trading models are an interesting topic in quantitative finance because they take a different approach from traditional trading and finance. This project looks closely at two of the best-known high-frequency trading models. The first comes from Asset Pricing Theory: the martingale, which states that, from a market microstructure perspective, the expected future price equals the present price, i.e. $E[p_{t+1} \mid p_t] = p_t$.
Keeping the same market microstructure perspective, the second is the Roll model (1984), a simple but useful model. It shows that the bid-ask spread can be inferred from the variance and the first-order autocovariance of price changes, and can be measured by:
$Spread = 2\sqrt{-cov}$
This project studies two of the best-known high-frequency trading models. The first comes from Asset Pricing Theory, which states that the price of a risky asset, $P_t$, derived from a model of present and future consumption, can be modeled, from the market microstructure perspective, as a stochastic process called a martingale.
A martingale is a sequence of random variables (i.e., a stochastic process) for which, at any given time (here measured in seconds or milliseconds, given the market microstructure perspective), the conditional expectation of the next value in the sequence equals the present value, regardless of all prior values, i.e. $E[p_{t+1} \mid p_t, p_{t-1}, \dots] = p_t$.
This is tested using order book data containing about 2400 timestamps. The observations are grouped by minute, and within each minute we count the occurrences of martingales and non-martingales.
The second model is the Roll model (1984). As mentioned above, its purpose is to produce a theoretical spread. Before applying it, it is useful to keep in mind that the model requires two major assumptions:
1) The asset is traded in an informationally efficient market
2) The probability distribution of observed price changes is stationary (at least for short intervals of, say, two months)
Given these assumptions, the rest is basic statistics, and we will see whether the model gives a good approximation of the spread.
To run this notebook, the packages listed in the requirements.txt file must be installed. The notebook then starts with the following imports:
import plotly.io as pio
pio.renderers.default='notebook'
import main as mn
import data as dt
import functions as fn
import visualizations as vz
orderbooks_05jul21.json:
A dataset containing about 2400 order books from Bitfinex, stored as a dictionary whose keys are the individual timestamps. Each timestamp contains the following:
help(fn.experiments)
Help on function experiments in module functions:
experiments(ob_data: dict, ob_ts: list, method: str) -> pandas.core.frame.DataFrame
Function used to perform experiments with orderbook data.
arguments:
----------
ob_data: dictionary
dictionary type with the following structure:
'timestamp'
'bid_size'
'bid'
'ask'
'ask_size'
ob_ts: list
list with timestamps in string format.
method: str: 'midprice' or 'wmidprice'
string with the method that's going to be used in calculations.
Returns -> dataframe
References:
----------
[1] Martingale. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Martingale&oldid=49256
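The `method` argument above switches between a midprice and a weighted midprice, but the notebook does not show how either is computed. The sketch below is one common way to define them from the best bid and ask. The function names and the volume-weighting convention are assumptions made for illustration, and `fn.experiments` may use a different formula.

```python
def midprice(bid: float, ask: float) -> float:
    """Plain midprice: the average of the best bid and the best ask."""
    return (bid + ask) / 2.0


def weighted_midprice(bid: float, bid_size: float,
                      ask: float, ask_size: float) -> float:
    """Volume-weighted midprice (assumed convention, not taken from the notebook).

    Each price is weighted by the volume resting on the opposite side,
    so a heavier bid side pulls the result towards the ask.
    """
    return (bid * ask_size + ask * bid_size) / (bid_size + ask_size)


# Example with made-up top-of-book values.
print(midprice(100.0, 102.0))                     # 101.0
print(weighted_midprice(100.0, 3.0, 102.0, 1.0))  # 101.5
```

With equal sizes on both sides the two definitions coincide; they diverge only when the book is imbalanced.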
e1 = mn.e1
e1[:3]
| | intervalo | total | e1 | e1_proportion | e2 | e2_proportion |
|---|---|---|---|---|---|---|
| 0 | 0 | 41 | 27 | 0.6585 | 14 | 0.3415 |
| 1 | 1 | 40 | 32 | 0.8000 | 8 | 0.2000 |
| 2 | 2 | 39 | 27 | 0.6923 | 12 | 0.3077 |
e1.tail(3)
| | intervalo | total | e1 | e1_proportion | e2 | e2_proportion |
|---|---|---|---|---|---|---|
| 57 | 57 | 41 | 29 | 0.7073 | 12 | 0.2927 |
| 58 | 58 | 40 | 27 | 0.6750 | 13 | 0.3250 |
| 59 | 59 | 39 | 31 | 0.7949 | 8 | 0.2051 |
To test whether the midprices or weighted midprices behave as a martingale, this project checks, for each pair of consecutive observations, whether the following statement holds: $p_{t+1} = p_t$.
Where
$p_{t+1} = \text{future price}$
and
$p_{t} = \text{present price}$
This is easy to check with a list comprehension:
True_False_list = [midprice[i+1] == midprice[i] for i in range(len(midprice)-1)]
print('The average proportion of martingale is', mn.e11_mean, 'and for no-martingales:', mn.e12_mean)
The average proportion of martingale is 0.73 and for no-martingales: 0.27
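The per-minute tables above can be reproduced in spirit with a few lines of plain Python. The sketch below is not the code behind `mn.e1`; the function name is hypothetical, timestamps are assumed to be ISO 8601 strings, and each comparison is assigned to the minute of its earlier observation. Note that it counts exact equality of consecutive prices, which is the check used in this notebook and is stricter than the conditional-expectation definition of a martingale.

```python
from collections import defaultdict
from datetime import datetime


def count_by_minute(timestamps, prices):
    """Per-minute counts of unchanged (e1) and changed (e2) consecutive prices."""
    counts = defaultdict(lambda: {"total": 0, "e1": 0, "e2": 0})
    # Pair every observation with its successor; the last one has none.
    for ts, now, nxt in zip(timestamps, prices, prices[1:]):
        minute = datetime.fromisoformat(ts).minute  # assumes ISO 8601 strings
        row = counts[minute]
        row["total"] += 1
        row["e1" if nxt == now else "e2"] += 1
    # One row per minute, with the same column names as the tables above.
    return [
        {"intervalo": minute, **row,
         "e1_proportion": row["e1"] / row["total"],
         "e2_proportion": row["e2"] / row["total"]}
        for minute, row in sorted(counts.items())
    ]


# Made-up data: three observations in minute 0, two in minute 1.
timestamps = ["2021-07-05T00:00:00", "2021-07-05T00:00:20",
              "2021-07-05T00:00:40", "2021-07-05T00:01:00",
              "2021-07-05T00:01:30"]
prices = [10.0, 10.0, 11.0, 11.0, 12.0]
rows = count_by_minute(timestamps, prices)
for row in rows:
    print(row)
```

Averaging `e1_proportion` over all rows would give a figure comparable to the average proportion printed above.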
e2 = mn.e2
e2[:3]
| | intervalo | total | e1 | e1_proportion | e2 | e2_proportion |
|---|---|---|---|---|---|---|
| 0 | 0 | 41 | 27 | 0.6585 | 14 | 0.3415 |
| 1 | 1 | 40 | 27 | 0.6750 | 13 | 0.3250 |
| 2 | 2 | 39 | 26 | 0.6667 | 13 | 0.3333 |
e2.tail(3)
| | intervalo | total | e1 | e1_proportion | e2 | e2_proportion |
|---|---|---|---|---|---|---|
| 57 | 57 | 41 | 27 | 0.6585 | 14 | 0.3415 |
| 58 | 58 | 40 | 27 | 0.6750 | 13 | 0.3250 |
| 59 | 59 | 39 | 26 | 0.6667 | 13 | 0.3333 |
print('The average proportion of martingale is', mn.e21_mean, 'and for no-martingales:', mn.e22_mean)
The average proportion of martingale is 0.68 and for no-martingales: 0.32
The proportions differ between the midprice and the weighted midprice, but in both cases the martingale proportion is still the higher one.
To obtain the first parameter, we first compute the differences between consecutive prices and store them in a list. We then use the numpy library to compute the variance of these values.
print('the first parameter of the model is the variance, which is:',fn.roll_model(dt.data)['Final_Parameters'][0])
the first parameter of the model is the variance, which is: 8.42555592321929
The second parameter is not a variance: it is the first-order autocovariance of the price differences.
print('the second parameter of the model is the autocovariance, which is:',fn.roll_model(dt.data)['Final_Parameters'][1])
the second parameter of the model is the autocovariance, which is: -0.0012275561298996467
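The two parameters described above (variance and first-order autocovariance of the price differences) can be computed in a few lines. The sketch below is not `fn.roll_model`: the function name is hypothetical, it uses only the standard library instead of numpy so that it is self-contained, and it is checked on simulated prices with a known half-spread, not on the order book data.

```python
import math
import random
from statistics import fmean


def roll_estimates(prices):
    """Return (variance, first-order autocovariance, implied spread)."""
    diffs = [b - a for a, b in zip(prices, prices[1:])]
    mean = fmean(diffs)
    # Population variance of the price changes.
    variance = fmean((d - mean) ** 2 for d in diffs)
    # First-order autocovariance: mean product of consecutive deviations.
    autocov = fmean((x - mean) * (y - mean) for x, y in zip(diffs, diffs[1:]))
    # The square root only exists when the autocovariance is negative.
    spread = 2.0 * math.sqrt(-autocov) if autocov < 0 else float("nan")
    return variance, autocov, spread


# Simulated check (assumed parameters): random-walk efficient price plus a
# bid-ask bounce of +/- c, so the true spread is 2 * c = 1.0.
random.seed(0)
c, sigma_u, n = 0.5, 0.1, 50_000
efficient, observed = 100.0, []
for _ in range(n):
    efficient += random.gauss(0.0, sigma_u)
    observed.append(efficient + c * random.choice((-1.0, 1.0)))

variance, autocov, spread = roll_estimates(observed)
print(f"variance={variance:.4f} autocov={autocov:.4f} spread={spread:.4f}")
```

On the simulated series the estimated spread should land close to 1.0. On real data the autocovariance can come out positive, in which case the estimator is undefined and the sketch returns NaN.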
$Var(\Delta P_t) = 2C^{2} + \sigma_u^{2}$
$Cov(\Delta P_{t-1}, \Delta P_t) = -C^{2}$
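These two moments follow from the usual Roll (1984) setup, which the notebook does not spell out and is assumed here: the observed price is an efficient price $m_t$ plus or minus the half-spread $C$, with trade direction $q_t = \pm 1$ equally likely and independent of the innovations $u_t$. The second part plugs in the two values printed above, assuming they are the ones used for the theoretical spread.

$$
\begin{aligned}
P_t &= m_t + C q_t, \qquad m_t = m_{t-1} + u_t \\
\Delta P_t &= u_t + C\,(q_t - q_{t-1}) \\
Var(\Delta P_t) &= \sigma_u^{2} + C^{2}\,Var(q_t - q_{t-1}) = 2C^{2} + \sigma_u^{2} \\
Cov(\Delta P_{t-1}, \Delta P_t) &= -C^{2}\,Var(q_{t-1}) = -C^{2} \\
Spread &= 2C = 2\sqrt{-Cov(\Delta P_{t-1}, \Delta P_t)}
\end{aligned}
$$

$$
\begin{aligned}
C^{2} &\approx 0.0012276 \;\Rightarrow\; C \approx 0.0350, \qquad Spread \approx 0.0701 \\
\sigma_u^{2} &\approx 8.42556 - 2\,(0.0012276) \approx 8.4231
\end{aligned}
$$

Under these assumptions almost all of the variance of the price changes is attributed to the efficient-price innovations, and only a very small part to the bid-ask bounce.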
help(vz.exp1_plot)
Help on function exp1_plot in module visualizations:
exp1_plot(df: pandas.core.frame.DataFrame, x: str, y: str) -> 'stackedbarplot'
Function used to plot a stacked bar for experiment 1
arguments:
----------
df: DataFrame
DataFrame containing results from a martingale analysis
x: str
x-axis (0-59 minutes)
y: str
y-axis (martingales and non martingales)
Returns -> stacked barplot
e1_plot = vz.exp1_plot(df = e1, x = 'intervalo', y = ['e1', 'e2'])
The result is consistent over the whole hour, but less consistent than the weighted midprice result.
e2_plot = vz.exp1_plot(df = e2, x = 'intervalo', y = ['e1', 'e2'])
The result is consistent over the whole hour.
From the two results, we can conclude that, from a market microstructure perspective, about 70% of the consecutive observations are consistent with a stochastic process of the martingale type.
vz.plot_roll(dt.data, vz.df_roll, 'observed', True)
As we can see, this time series is made up of the midprice and the observed ask and bid, and there is a minimal spread between the lines.
vz.plot_roll(dt.data, vz.df_roll, 'theorical', True)
The difference between this plot and the previous one is that the lines are barely distinguishable. This is because the theoretical spread came out very small, so the gap between the lines is correspondingly small.
APT
The APT model is rather abstract: it involves many concepts and may seem advanced and difficult to understand, but once we dig a little deeper into its foundations and focus on them, its purpose and what it explicitly means become easy to understand:
The best we can expect of the future is what is already happening in the present. I think this makes much more sense from the market microstructure perspective, i.e. when time intervals are very small. As found in this project, about 70% of the time a price, from the market microstructure perspective, behaves as a martingale.
Roll
This model does not require many metrics or calculations; the difficult part was identifying its main point. We found that the model differs considerably from the observed values. I would say the model was not accurate here, and many adjustments and corrections would be needed to improve it.
[1] Munnoz, 2020. Python project template. https://github.com/iffranciscome/python-project. (2021).
[2] Martingale. Encyclopedia of Mathematics. URL: http://encyclopediaofmath.org/index.php?title=Martingale&oldid=49256